KmerStream: streaming algorithms for k-mer abundance estimation
Similar resources
KmerStream: streaming algorithms for k-mer abundance estimation
MOTIVATION: Several applications in bioinformatics, such as genome assemblers and error correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment. RESULTS: We present KmerStream, a streaming ...
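To make the quantity in this abstract concrete, the sketch below counts k-mers exactly and builds the abundance histogram (how many distinct k-mers occur once, twice, and so on). It is an illustration only: it keeps every k-mer in memory, which is what a streaming estimator is meant to avoid, and it is not the KmerStream algorithm. The function names `kmer_counts` and `abundance_histogram` and the toy reads are assumptions made for this example.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Exact k-mer counts: maps each substring of length k to its frequency."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Maps a frequency f to the number of distinct k-mers seen exactly f times."""
    return Counter(counts.values())


# Toy input (hypothetical reads, not real sequencing data).
reads = ["ACGTAC", "CGTACGA"]
counts = kmer_counts(reads, 3)
hist = abundance_histogram(counts)

distinct_kmers = len(counts)        # number of different k-mers observed
total_kmers = sum(counts.values())  # number of k-mer occurrences
```

In real sequencing data, k-mers that appear only once are often sequencing errors, which is why the low-frequency end of this histogram is informative about the error rate.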
Kmerlight: fast and accurate k-mer abundance estimation
k-mers (nucleotide strings of length k) form the basis of several algorithms in computational genomics. In particular, k-mer abundance information in sequence data is useful in read error correction, parameter estimation for genome assembly, digital normalization etc. We give a streaming algorithm Kmerlight for computing the k-mer abundance histogram from sequence data. Our algorithm is fast an...
Streaming Algorithms for k-core Decomposition
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-Hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks...
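The k-core definition above can be illustrated with the standard offline peeling procedure: repeatedly delete any vertex with fewer than k remaining neighbours. This is a minimal sketch that assumes the whole graph fits in memory as an adjacency dictionary; it is not the streaming method of the paper, and the name `k_core_vertices` is hypothetical.

```python
def k_core_vertices(adj, k):
    """Return the vertices that survive peeling at threshold k.

    adj maps each vertex to the set of its neighbours (undirected graph).
    The connected components of the surviving subgraph are the k-cores.
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    removed = set()
    # Start with every vertex whose degree is already below k.
    stack = [v for v, d in degree.items() if d < k]
    while stack:
        v = stack.pop()
        if v in removed:
            continue  # a vertex can be pushed more than once
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return set(adj) - removed


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
two_core = k_core_vertices(graph, 2)
three_core = k_core_vertices(graph, 3)
```

Here the pendant vertex is peeled away at k = 2, leaving the triangle, while no vertex survives at k = 3.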
Streaming Algorithms with One-Sided Estimation
We study the space complexity of randomized streaming algorithms that provide one-sided approximation guarantees; e.g., the algorithm always returns an overestimate of the function being computed, and with high probability, the estimate is not too far from the true answer. We also study algorithms which always provide underestimates. We also give lower bounds for several one-sided estimators th...
Streaming Algorithms for k-Means Clustering with Fast Queries
We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. When compared with the current state-of-the-art, our methods provide a substantial improvement in the time to answer a query for cluster centers, while retaining the desirable properties of provably small approximation error, and low space usage. Our algorithms are based on a no...
Journal
Journal title: Bioinformatics
Year: 2014
ISSN: 1460-2059, 1367-4803
DOI: 10.1093/bioinformatics/btu713